Develop upstream sync 251224 #3170
PiperOrigin-RevId: 846167560
…intExpression. Helps narrow down which constraints are unsatisfiable. There can be many constraints (e.g. WGMMA in Mosaic), and while debugging it's unclear at a glance which one is violated. As a follow-up, we can also add a name to each Constraint to make identification even easier. PiperOrigin-RevId: 846168559
PiperOrigin-RevId: 846171859
PiperOrigin-RevId: 846173555
…TF normalization in emitters
0) Fix a bug (?) in the normalization util when a normalized dim contains a single dimension
1) Perform normalization OTF for Transpose emitter selection
2) Use the normalized shape for the unrolling decision in the kLoop emitter
3) Use the normalized shape to detect slow transposes in the Triton fusion rewriter
PiperOrigin-RevId: 846191206
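The shape normalization used for transpose emitter selection can be illustrated with a minimal C++ sketch. It assumes the HLO transpose convention (output dimension `i` reads input dimension `perm[i]`) and collapses runs of input dimensions that stay adjacent after the permutation, so a `[2,3,4]` transpose with `perm = {2,0,1}` behaves like a `[6,4]` transpose with `perm = {1,0}`. `NormalizeTranspose` and `NormalizedTranspose` are hypothetical names, not XLA's normalization utility, and this sketch does not drop size-1 dimensions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch, not XLA's utility. Assumed convention: output
// dimension i of the transpose reads input dimension perm[i].
struct NormalizedTranspose {
  std::vector<int64_t> dims;  // collapsed input dimensions, in input order
  std::vector<int> perm;      // permutation over the collapsed dimensions
};

NormalizedTranspose NormalizeTranspose(const std::vector<int64_t>& dims,
                                       const std::vector<int>& perm) {
  // Walk the output order and group input dims that remain adjacent
  // (perm[i] == perm[i - 1] + 1); each group behaves like one dimension.
  std::vector<std::vector<int>> runs;  // runs[r] = input dims of output group r
  for (std::size_t i = 0; i < perm.size(); ++i) {
    if (i > 0 && perm[i] == perm[i - 1] + 1) {
      runs.back().push_back(perm[i]);
    } else {
      runs.push_back({perm[i]});
    }
  }

  // Sort the groups by the input dimension they start at to get input order.
  std::vector<int> input_order(runs.size());
  for (std::size_t r = 0; r < runs.size(); ++r) {
    input_order[r] = static_cast<int>(r);
  }
  std::sort(input_order.begin(), input_order.end(),
            [&](int a, int b) { return runs[a].front() < runs[b].front(); });

  NormalizedTranspose result;
  result.perm.resize(runs.size());
  for (std::size_t k = 0; k < input_order.size(); ++k) {
    const int r = input_order[k];
    int64_t size = 1;
    for (int d : runs[r]) size *= dims[d];  // product of the collapsed dims
    result.dims.push_back(size);            // k-th collapsed input dimension
    result.perm[r] = static_cast<int>(k);   // output group r reads input dim k
  }
  return result;
}
```

An identity permutation collapses to a single dimension, which is why decisions such as "is this transpose slow?" are easier to make on the normalized form.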
…t.cc This change updates custom_call_test.cc to dynamically register custom call targets and FFI handlers using the runtime-determined platform name (CUDA or ROCM). This replaces the use of static registration macros, allowing the tests to run correctly across different GPU platforms and the reference interpreter. This way we can avoid compile-time branches like `#ifdef GOOGLE_CUDA` and similar.
Also:
1. Converts usage of raw CUDA driver API functions to StreamExecutor functionality
2. Replaces some legacy CustomCalls with FFI
3. Converts the while test target to HloRunnerPjRt
4. Removes a test case from the Token tests with a nested type in the output type, since that's not supported by our PjRt implementation.
PiperOrigin-RevId: 846196106
The `fd.Size()` check doesn't work when the file descriptor is invalid and only the path was given. PiperOrigin-RevId: 846207406
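One way to avoid relying on a size query against an invalid descriptor is to fall back to the path. The sketch below is a hypothetical `FileSize` helper built on the POSIX `fstat`/`stat` calls; it illustrates the idea and is not the code from this commit.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <sys/stat.h>

// Hypothetical helper (not the commit's code): use the descriptor when it is
// valid, otherwise query the file by path.
std::optional<uint64_t> FileSize(int fd, const std::string& path) {
  struct stat st;
  const int rc = (fd >= 0) ? fstat(fd, &st) : stat(path.c_str(), &st);
  if (rc != 0) return std::nullopt;  // neither source could be queried
  return static_cast<uint64_t>(st.st_size);
}
```

Returning `std::nullopt` keeps "size unknown" distinct from a genuinely empty file.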
PiperOrigin-RevId: 846213195
PiperOrigin-RevId: 846214738
PiperOrigin-RevId: 846217449
PiperOrigin-RevId: 846221230
PiperOrigin-RevId: 846221752
The ROCm code path doesn't go through NcclCollectives anymore. Therefore these checks are obsolete. PiperOrigin-RevId: 846226180
PiperOrigin-RevId: 846226345
PiperOrigin-RevId: 846231902
PiperOrigin-RevId: 846234559
PiperOrigin-RevId: 846238886
This migrates `builder.create<Op>()` => `Op::create()` PiperOrigin-RevId: 846246070
This change moves the definition of `AotCompilationResult` into a new header file `compiled_module.h` and renames the class to `CompiledModule`. `CompilationResult` would have been the preferred name, but it's already in-use elsewhere. The original `AotCompilationResult` is kept as a deprecated alias. PiperOrigin-RevId: 846246415
…ests, rather than on the original dimensions. These are simpler both to write and to think about. No behavior changes are intended. PiperOrigin-RevId: 846253300
… its allocation later
Imported from GitHub PR openxla/xla#35510
📝 Summary of Changes
Initialize the collectives pointer to nullptr.
🎯 Justification
GPU runtime options are initialized in TF and transferred to XLA to execute thunks. Because the memory is not cleared, the collectives pointer points to uninitialized memory, resulting in a segfault during NCCL collective initialization and operation.
🚀 Kind of Contribution
🐛 Bug Fix
Copybara import of the project:
-- 2bfc6fbddbf2f9a926dd504169c56be45d2f1a0a by Harsha HS: [ROCm] Initialize collectives to nullptr to force its allocation later
Merging this change closes tensorflow#35510
PiperOrigin-RevId: 846266642
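The fix relies on a default member initializer. In this minimal sketch, `RuntimeOptions`, `Collectives`, and `GetOrCreateCollectives` are hypothetical stand-ins for the real GPU runtime types: a pointer that starts as `nullptr` lets the runtime reliably detect that nothing was provided and fill it in later (here, with a default instance), instead of dereferencing leftover bytes.

```cpp
#include <cassert>

// Hypothetical stand-ins for the real runtime types; names are illustrative.
struct Collectives {
  int id = 0;
};

struct RuntimeOptions {
  // Default member initializer: even `RuntimeOptions opts;` on the stack
  // starts with a null pointer instead of whatever was in memory.
  Collectives* collectives = nullptr;
};

// Lazily choose a collectives implementation only if none was provided.
Collectives* GetOrCreateCollectives(RuntimeOptions& opts) {
  if (opts.collectives == nullptr) {
    static Collectives default_collectives;  // lives for the whole program
    opts.collectives = &default_collectives;
  }
  return opts.collectives;
}
```

Without the `= nullptr`, the `== nullptr` check above would read an indeterminate value, which is the failure mode the PR describes.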
This migrates `builder.create<Op>()` => `Op::create()` PiperOrigin-RevId: 846268375
…utor_test. The local_defines for CUDA/ROCM are not required for this test. Added explicit includes for headers used in gpu_executor_test.cc. PiperOrigin-RevId: 846269233
Imported from GitHub PR openxla/xla#35482
Sometimes the JSON generation incorrectly parses compile commands from Bazel, and we end up passing them to `clangd` as
```
"-isystem path/to/includes"
```
which `clangd` then parses incorrectly.
Copybara import of the project:
-- adf291e21b098d79fa3be4065ee02fafdf5c660a by Eugene Zhulenev: Correctly generate compile_commands.json
Merging this change closes tensorflow#35482
PiperOrigin-RevId: 846269357
Depending on the compiler, `testing::TempDir() + __FUNCTION__` may generate an invalid file name. PiperOrigin-RevId: 846275995
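A hypothetical way to keep such paths valid is to sanitize the function name before appending it. The allowed character set `[A-Za-z0-9._-]` and the name `SanitizeForFileName` are assumptions for illustration, not what this commit does; for instance, if a compiler reports a qualified name such as `Foo::Bar`, the `:` characters would otherwise end up in the path.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: replace every character outside [A-Za-z0-9._-]
// with '_' so the result is safe to use as a file name component.
std::string SanitizeForFileName(const std::string& name) {
  std::string out = name;
  for (char& c : out) {
    const unsigned char uc = static_cast<unsigned char>(c);
    if (!std::isalnum(uc) && c != '.' && c != '_' && c != '-') c = '_';
  }
  return out;
}
```

Usage would then look like `testing::TempDir() + SanitizeForFileName(__FUNCTION__)`.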
…iguous send/recv buffers
Imported from GitHub PR openxla/xla#35463
With the latest NCCL, we can use the `ncclAlltoall` API directly without having to launch grouped send and recv operations.
Copybara import of the project:
-- 0630f4d48049b211442dcb1754e521a4b1f37f7b by Eugene Zhulenev: [xla:gpu] Support ncclAlltoall directly for contiguous send/recv buffers
Merging this change closes tensorflow#35463
PiperOrigin-RevId: 846277559
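The data movement a single all-to-all performs on contiguous buffers can be simulated on the host. This sketch makes no NCCL calls; `SimulateAllToAll` is a hypothetical name, and it assumes `n` ranks whose send buffers each hold `n` equally sized chunks, where chunk `j` of rank `i` ends up as chunk `i` of rank `j`. The grouped send/recv approach expresses the same exchange as one send and one receive per rank pair.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side simulation of all-to-all on contiguous buffers (no NCCL calls).
// send[i] holds n chunks of `chunk` elements; chunk j of rank i is delivered
// to chunk i of rank j's receive buffer.
std::vector<std::vector<int>> SimulateAllToAll(
    const std::vector<std::vector<int>>& send, std::size_t chunk) {
  const std::size_t n = send.size();
  std::vector<std::vector<int>> recv(n, std::vector<int>(n * chunk));
  for (std::size_t src = 0; src < n; ++src) {
    for (std::size_t dst = 0; dst < n; ++dst) {
      for (std::size_t k = 0; k < chunk; ++k) {
        recv[dst][src * chunk + k] = send[src][dst * chunk + k];
      }
    }
  }
  return recv;
}
```

Because every chunk has a fixed offset in both buffers, the whole exchange is described by two base pointers and a chunk size, which is what makes a single collective call possible for the contiguous case.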
…is supported by libraries. PiperOrigin-RevId: 846299624
We can add the output pointer to StreamState, and it will then have all the information needed for the rendezvous. There is no need for a separate RendezvousValue struct. PiperOrigin-RevId: 846313928
For example, if we have a fusion
```
dot
bitcast1
...
bad_op
...
bitcast2
...
ROOT root = ...
```
we can still benefit from sinking bitcast2 even though the instructions between dot and bad_op will not change.
PiperOrigin-RevId: 846314341
PiperOrigin-RevId: 848393091
PiperOrigin-RevId: 848423026
PiperOrigin-RevId: 848429925
PiperOrigin-RevId: 848434764
PiperOrigin-RevId: 848441651
…stub. The `xtile_compiler` target now acts as a selector, depending on either `xtile_compiler_impl` or `xtile_compiler_stub` based on whether CUDA or ROCm is configured. The full implementation is moved to the new `xtile_compiler_impl` target, while `xtile_compiler_stub` provides a minimal version for other configurations. This has the advantage that build_cleaner can run on xtile_compiler_impl. (Doing that removed around 20 dependencies) PiperOrigin-RevId: 848442213
PiperOrigin-RevId: 848455572
PiperOrigin-RevId: 848467225
PiperOrigin-RevId: 848467272
PiperOrigin-RevId: 848475361
It has to become a part of Compiler::CompilerOptions, but CompilerOptions should not depend on PJRT. So, moving it here. PiperOrigin-RevId: 848523186
PiperOrigin-RevId: 848534440
This test fails; it seems the backend config (
aeda463 to 9135a29
9135a29 to b28eff1
This is a deviceless test; the problem was in the file path. Fixed in 3a69036.
Hi @i-chaochen, can we merge this?
Thanks!
Yes, please merge it, and remember to push the tag.
Motivation
Bi-weekly sync from TensorFlow upstream
Disabled tests:
I reviewed the old disabled UTs; some were re-enabled, and others were moved to the excluded list in the testing scripts. All details are in https://github.com/ROCm/frameworks-internal/issues/14968
Submission Checklist